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6.4 Krylov Subspaces and Conjugate Gradients 

Our original equation is Ax = b. The preconditioned equation is P~^Ax = P~^b. 
When we write P~^, we never intend that an inverse will be explicitly computed. P 
may come from Incomplete LU, or a few steps of a multigrid iteration, or "domain 
decomposition." Entirely new preconditioners are waiting to be invented. 

The residual is = b — Ax^. This is the error in Ax = b, not the error in x 
itself. An ordinary preconditioned iteration corrects Xk by the vector P~^rk'. 

Pxk+i = {P-A)xk + b or Pxk+i = Pxk + Tk or Xk+i = Xk + P~^rk ■ (1) 

In describing Krylov subspaces, I should work with P^^A. For simplicity I will 
only write A ! I am assuming that P has been chosen and used, and the precondi- 
tioned equation P~^Ax = P~^b is given the notation Ax = b. The preconditioner is 
now P = L Our new A is probably better than the original matrix with that name. 

With xi = b, look first at two steps of the pure iteration x^+i = (/ — A)xj + b: 
X2 = (I - A)b + b = 2b- Ab X3 = {I - A)xi + b = 3b - 3Ab + A'^b. (2) 

My point is simple but important: xj is a combination of b, Ab, . . . , A^~^b. 

We can compute those vectors quickly, multiplying each time by a sparse A. Ev- 
ery iteration involves only one matrix-vector multiplication. Krylov gave a name 
to all combinations of those vectors, and he suggested that there might be better 
combinations than the particular choices xj in (2). 

Usually a different combination will come closer to the desired x = A~^b. 

Krylov subspaces 

The linear combinations of b, Ab, . . . , A^~^b form the jth Krylov subspace. This space 
depends on A and b. Following convention, I will write JCj for that subspace and K.j 
for the matrix with those basis vectors in its columns: 

Krylov matrix = \ b Ab A^b . . . A^~^b 1 . 

. . ^ ._ (3) 

Krylov subspace Kj = all combinations of b, Ab, . . . , A^ ^b. 

Thus ICj is the column space of Kj. We want to choose the best combination as 
our improved Xj. Various definitions of "best" will give various Xj. Here are four 
different approaches to choosing a good Xj in Kj — this is the important decision: 

1. The residual rj = b — Axj is orthogonal to Kj (Conjugate Gradients). 

2. The residual rj has minimum norm for xj in JCj (GMRES and MINRES). 

3. Tj is orthogonal to a different space Kji^A^) (BiConjugate Gradients). 
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4. The error ej has minimum norm (SYMMLQ). 

In every case we hope to compute the new Xj quickly from the earher x's. If that 
step only involves Xj^i and Xj_2 (short recurrence) it is especially fast. Short 
recurrences happen for conjugate gradients and symmetric positive definite A. The 
BiCG method is a natural extension of short recurrences to unsymmetric A (using 
two Krylov spaces). A stabilized version called BiCGStab chooses xj in A^K.j{A^). 

As always, computing be very unstable until we choose a decent basis. 



Vandermonde Example 



To follow each step of orthogonalizing the basis, and solving Ax = b hj conjugate 
gradients, we need a good example. It has to stay simple ! I am happy with this one: 



A 



"1 




"l" 


2 




1 




b = 


3 




1 


4 




1 



Ab 



A-^b 



1/1' 
1/2 

1/3 

1/4 



(4) 



That constant vector b spans the Krylov subspace K^. Then Ab, A%, and A% are 
the other basis vectors in IC^. They are the columns of K4, which we will name V: 



Vandermonde matrix 



V 



1111 
12 4 8 
1 3 9 27 
1 4 16 64 



(5) 



Those columns are constant, linear, quadratic, and cubic. The column vectors are 
independent but not at all orthogonal. The best measure of non-orthogonality starts 
by computing inner products of the columns in the matrix V^V . When columns are 
orthonormal, their inner products are or 1. (The matrix is called Q, and the inner 
products give Q^Q = I.) Here V'^V is far from the identity matrix! 



V^V 



4 10 30 100 

10 30 100 354 

30 100 354 1300 

100 354 1300 4890 



10 
30 
100 
1300 



1 +2 +3 +4 

12 + 22 + 32 + 42 

13 + 23 + 33 + 43 
1^ + 2^ + 3^ + 45 



The eigenvalues of this inner product matrix [Gram matrix) tell us something im- 
portant. The extreme eigenvalues are Amax ~ 5264 and Amin ~ .004. Those are 
the squares of (T4 and ai, the largest and smallest singular values of V. The key 
measure is their ratio cr^/ai, the condition number of V: 



cond(y^V^) 



5264 



10^ cond(V) 



A. 



A. 



1000. 



For such a small example, 1000 is a poor condition number. For an orthonormal basis 
with Q^'^Q = I, all eigenvalues = singular values = condition number = 1. 



6.4. KRYLOV SUBSPACES AND CONJUGATE GRADIENTS ©2006 Gilbert Strang 



We could improve the condition by rescaling the columns of V to unit vectors. 
Then V^V has ones on the diagonal, and the condition number drops to 263. But 
when the matrix size is realistically large, that rescaling will not save us. In fact 
we could extend this Vandermonde model from constant, linear, quadratic, and cubic 
vectors to the functions 1 ^. [A multiplies by x.) Please look at what happens ! 



Continuous Vandermonde matrix 



X X 



]• 



(6) 



Again, those four functions are far from orthogonal. The inner products in VjVc 
change from sums to integrals. Working on the interval from to 1, the integrals are 
Jq x^x^ dx = + j — 1). They appear in the Hilbert matrix: 



Continuous inner products 
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1.5 and A 



(7) 



„„„ ~ 10"^ As 

always, those are the squares of the singular values cXmax and a^in of K- The condition 
number of the power basis 1, x, x^, x^ is the ratio o"max/o"min ~ 125. If you want a more 
impressive number (a numerical disaster), go up to x^. The condition number of the 
10 by 10 Hilbert matrix is Amax/Amin ~ 10^^, and 1, x, . . . , x^ is a very poor basis. 

To reduce that unacceptably large number, Legendre orthogonalized the basis. 
He chose the interval from —1 to 1, so that even powers would be automatically 
orthogonal to odd powers. The first Legendre polynomials 

Our point is that the Vandermonde matrix example (as we follow it below) will be 
completely parallel to the famous functions of Legendre. 

In particular, the three-term recurrence in the Arnoldi-Lanczos orthogonal- 
ization is exactly like Legendre's classical three-term recurrence for his polynomials. 
They appear for the same reason — the symmetry of A. 



Orthogonalizing the Krylov Basis 

The best basis qi,. . . ,qj for the Krylov subspace JCj is orthonormal. Each new qj 
comes from orthogonalizing t = Aqj_i to the basis vectors gi, . . . , qj^i that are already 
chosen. The iteration to compute these orthonormal g's is Arnoldi's method. 

This method is essentially the Gram-Schmidt idea (called modified Gram-Schmidt 
when we subtract the projections of t onto the g's one at a time, for numerical 
stability). We display one Arnoldi cycle for the Vandermonde example that has 
b=[l 1 1 1]' and A = diag([l 2 3 4]): 



Arnoldi's orthogonalization of b, Ab, . . . , A" ^b: 
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qti = % Normalize b to = 1 qi = [1 1 1 1^/2 
for j = 1, . . . , — 1 % Start computation of g^+i 

1 t = Aqj] % one matrix multiplication Aqi = [1 2 3 4] /2 
for 2 = 1, . . . , j % t is in the space JCj^i 

2 = g^^t; % hijQi = projection of t on qi hn = 5/2 

3 t = t — hijQi; % Subtract that projection t = [—3 —1 1 3]'/4 
end; % t is orthogonal to gi, . . . , 

4 hj^ij = Plh % Compute the length of t h2i = \/b/2 

5 Qj+i = t/hj+ij; % Normalize t to II gj+i II = 1 ^2 = [ — 3 —1 1 3]'/^/20 
end % qi, . . . ,qn are orthonormal 



You might like to see the four orthonormal vectors in the Vandermonde example. 
Those columns gi, g2, (Js, 94 of Q are still constant, linear, quadratic, and cubic. I 
can also display the matrix H of numbers hij that produced the g's from the Krylov 
vectors b, Ab, A^b, A^b. (Since Arnoldi stops at j = — 1, the last column of H is 
not actually computed. It comes from a final command H{ : , ra) = Q' * A * Q{: , n).) 
This H turns out to be symmetric and tridiagonal, and we will look for a reason. 



Arnoldi's method for the Vandermonde example V gives Q and H: 



Q 



-3 1 

-1 -1 

1 -1 

3 1 



-1 
3 

-3 
1 



2 ^20 2 720 



H 



■ 5/2 V5/2 
V^/2 5/2 

V^80 5/2 ^/aE 

yi5 5/2 . 



Please notice that H is not upper triangular. The usual QR factorization of the 
original Krylov matrix K (which was V in our example) has this same Q, but Arnoldi 
does not compute R. Even though the underlying idea copies Gram-Schmidt (at every 
step gj+i is a unit vector orthogonal to the previous j columns), there is a difference. 
The vector t that Arnoldi orthogonalizes against the previous gi, . . . , g^ is t = Aqj. 
This is not column j ' + 1 of K, as in Gram-Schmidt. Arnoldi is factoring AQ ! 



Arnoldi factorization AQ = QH for the final subspace JCr 



AQ 



Aqi 



Aq„ 



hn 


hi2 


h 


h2i 


h22 


h 





h23 










h, 



2n 



This matrix H is upper triangular plus one lower diagonal, which makes it ^^upper 
Hessenberg." The hij in step 2 go down column j as far as the diagonal. Then 
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hj+ij in step 4 is below the diagonal. We check that the first column of AQ = QH 
(multiplying by columns) is Arnoldi's first cycle that produces q2'. 

Column 1 Aqi = huqi + h2iq2 which is q2 = {Aqi - /iiigi)//i2i • (9) 

That subtraction is step 3 in Arnoldi's algorithm. The division by /i2i is step 5. 

Unless more of the hij are zero, the cost is increasing at every iteration. The 
vector updates in step 3 for j = 1, . . . , n — 1 give nearly updates and flops. 
A short recurrence means that most of these hij are zero, and the count of floating 
point operations drops to O(ra^). That happens when A = A"^. 



Arnoldi Becomes Lanczos 

The matrix H is symmetric and therefore tridiagonal when A is symmetric. 

This fact is the foundation of conjugate gradients. For a matrix proof, multiply 
AQ = QH by Q^. The left side Q^AQ is always symmetric when A is symmetric. 
Since H has only one lower diagonal, it has only one upper diagonal. This tridiagonal 
H has only three nonzeros in its rows and columns. So computing g^+i only involves 
qj and qj-i- 

Arnoldi when A = Aqj = hj^ij g^+i + hjj qj + hj-ij qj-i . (10) 

This is the Lanczos iteration. Each new g^+i = {Aqj — hjjqj — hj^ijqj^i)/hj^ij 
involves one multiplication Aqj, two dot products for /I's, and two vector updates. 

Allow me an important comment on the symmetric eigenvalue problem Ax = Xx. 
We have seen that H = Q^AQ is tridiagonal, and Q^ = Q~^ from the orthogonality 
Q"^Q = I. The matrix H = Q~^AQ has the same eigenvalues as A: 

Same A Hy = Q'^AQy = \y gives Ax = Xx with x = Qy . (11) 

It is much easier to find the eigenvalues A for a tridiagonal H than for the original A. 

For a large symmetric matrix, we often stop the Arnoldi-Lanczos iteration at a 
tridiagonal Hk with k < n. The full n-step process to reach Hn is too expensive, 
and often we don't need all n eigenvalues. So we compute the k eigenvalues of Hk 
instead of the n eigenvalues of H. These computed Ai^, . . . , A^fc (sometimes called 
Ritz values) can provide good approximations to the first k eigenvalues of A. And 
we have an excellent start on the eigenvalue problem for Hk+i, if we decide to take a 
further step. 

This Lanczos method will find, approximately and iteratively and quickly, the 
leading eigenvalues of a large symmetric matrix. For its inner loop (the eigenvalues 
of the tridiagonal Hk) we use the "Q-R method" described in section . 
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The Conjugate Gradient Method 

We return to iterative methods for Ax = b. The Arnoldi algorithm produced or- 
thonormal basis vectors qi,q2, ■ ■ ■ for the growing Kryfov subspaces /Ci, /C2, .... Now 
we select vectors Xk in Kk that approach the exact solution to Ax = b. 

We concentrate on the conjugate gradient method when A is symmetric positive 
definite. Symmetry gives a short recurrence. Definiteness prevents division by zero. 

The rule for x^ in conjugate gradients is that the residual r^ = b — Axk should 
be orthogonal to all vectors in /Cfc. Since will be in A3fc_|_i, it must be a multiple 
of Arnoldi's next vector q^+i ! Each residual is therefore orthogonal to all previous 
residuals (which are multiples of the previous g's): 

Orthogonal residuals rjrk = for i < k . (12) 

The difference between and g^+i is that the g's are normalized, as in qi = b/\\b\\. 

Similarly r^-i is a multiple of qt- Then the difference — Vk-i is orthogonal to 
each subspace JCi with i < k. Certainly Xi —Xi-i lies in that JCi. So Ar is orthogonal 
to earlier Ax's: 

{xi - Xi+i)^{rk - rk-i) = for i<k. (13) 
These differences Ax and Ar are directly connected, because the 6's cancel in Ar: 

rk - Tk^i = {b- Axk) - (6 - Axk-i) = -A{xk - Xk-i) . (14) 
Substituting (14) into (13), the updates Ax are "A- orthogonal" or conjugate: 
Conjugate directions [xi — Xi^ij^ A{xk — Xk-i) = for i < k . (15) 

Now we have all the requirements. Each conjugate gradient step ends with a 
"search direction" dk-i for the next update Xk — Xk~i- Steps 1 and 2 compute the 
correct multiple akdk-i to move to Xk- Using (14), step 3 finds the new r^. Steps 4 
and 5 orthogonalize r^ against the search direction just used, to find the next dk- 

The constants 13k in the search direction and ak in the update come from (12) 
and (13) for i = k — 1. For symmetric A, orthogonality will be automatic for i < k — 1, 
as in Arnoldi. We have a "short recurrence" for the new Xk and r^. 

Here is one cycle of the algorithm, starting from xq = and r^ = b and do = r^. 
Steps 1 and 3 involve the same matrix-vector multiplication Ad. 

Conjugate Gradient Method for Symmetric Positive Definite A 
Example: A = diag([l 2 3 4]) and 6= [1 1 1 1]' 



1 Qifc = r]:_-^rk-i/d1_-j^Adk-i % Step length to next Xk ct^ = 4/10 = 2/5 

2 Xk = Xk-i + akdk-i % Approximate solution x^ = [2 2 2 2]'/5 

3 r^ = rk-i — ctkAdk-i % New residual from (14) ri = [3 1 —1 —3]'/5 

4 /3fc = r]:rk/r1_^rk-i % Improvement this step f3\ = 1/5 

5 dk = Tk + Pkdk-i % Next search direction di = [4 2 — 2]'/5 
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The formulas for and l3k are explained briefly below — and fully by Trefethen-Bau [-] 
and Shewchuk [-] and many other good references. 

When there is a preconditioner P (to use even fewer CG steps for an accurate x), 
step 3 uses P~^A and the inner products in steps 1 and 4 include an extra factor P^^. 



Different Viewpoints on Conjugate Gradients 

1 want to describe the (same!) conjugate gradient method in two different ways: 

1. It solves a tridiagonal system Hy = f recursively, for Arnoldi's H. 

2. It minimizes the energy ^x'^Ax — x^b recursively. This is important. 



How does Ax = b change to the tridiagonal Hy = / ? Those are connected by 
Arnoldi's orthonormal columns gi, . . . , g„ in Q, with = and Q^AQ = H: 



Ax = b is {Q''AQ){Q''x) = Q^b which is Hy = f 



0,...,0). (16) 



Since qi is fe/ll^ll; ^^st component of / = Q'^b is q^b = \\b\\. The other components 
of / are qjb = because qi is orthogonal to gi. The conjugate gradient method is 
implicitly computing the symmetric tridiagonal H. When the method finds Xk, it 
also finds yk = Qjxk (but it doesn't say so). Here is the third step: 



Tridiagonal system Hy = / 
Implicitly solved by CG 



hn hi2 

^21 ^22 ^23 
^32 ^33 









'm' 














(17) 



This is the equation Ax = b projected by onto the third Krylov subspace /C3. 

These /I's never appear in conjugate gradients. We don't want to do Arnoldi too! 
It is the LDL^ factors of H that CG is somehow computing — two new numbers a 
and (3 at each step. Those give a fast update from yk~i to yk- The iterates Xk = QkUk 
from conjugate gradients approach the exact solution x„ = QnUn which is x = A~^b. 



Energy By seeing conjugate gradients as an energy minimizing algorithm, we can 
extend it to nonlinear problems and use it in optimization. For our linear equation 
Ax = b, the energy is E{x) = ^x'^Ax — x'^b. Minimizing E{x) is the same as solving 
Ax = b, when A is positive definite — this was the main point of Section 1.6. The 
CG iteration minimizes E{x) on the growing Krylov subspaces. 

The first subspace /Ci is the line in the direction = tq = b. Minimization of 
the energy E{x) for the vectors x = ab produces the number ai. 

1 b^b 
E{ab) = -a^b^ Ab — ab^b is minimized at ai = . (18) 



This ai is the constant chosen in step 1 of the first conjugate gradient cycle. 



©2006 Gilbert Strang 



The gradient of E( "^Ax — x^h is exactly Ax — h. The steepest descent 

direction at xi is along the negative gradient, which is ri ! This sounds hke the 
perfect direction di for the next move. But the great difficuhy with steepest descent 
is that this ri can be too close to the first direction do. Little progress that way. So 
step 5 adds the right multiple [3idQ, in order that the new di = ri + Pido will be 
A-orthogonal to the first direction do. 

Then we move in this conjugate direction di to X2 = xi + 0:2(^1. This explains 
the name conjugate gradients. The pure gradients of steepest descent would be too 
nearly parallel, and we would take small steps across a valley instead of a good step 
to the bottom (the minimizing x in Figure 6.15). Every cycle of CG chooses ak to 
minimize E{x) in the new search direction x = Xk^i + adk-i. The last cycle (if we go 
that far) gives the overall minimizer Xn = x = A^^b. 



Figure 6.14: Steepest descent vs. conjugate gradient. 



The main point is always this. When you have orthogonality, projection 
and minimizations can be computed one direction at a time. 



Example 



Suppose Ax = b 



IS 



"211" 




3 ■ 




" 4 " 


12 1 




-1 







1 1 2 




-1 








From xq = (0, 0, 0) and tq = do = b the first cycle gives ai = | and xi = \b = (2, 0, 0). 
The new residual is ri = 6 — Axi = (0, —2, —2). Then the CG algorithm yields 



di 



«2 = 

16 



X2 



3 
-1 
-1 



A-^b\ 



The correct solution is reached in two steps, where normally CG will take n = 3 steps. 
The reason is that this particular A has only two distinct eigenvalues 4 and 1. In that 
case A~^b is a combination of b and Ab, and this best combination X2 is found at cycle 2. 
The residual r2 is zero and the cycles stop early — very unusual. 
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Energy minimization leads in [ ] to an estimate of the convergence rate for the 
error e = x — in conjugate gradients, using the A-norm \\e\\A = V e^Ae: 

)^ 

Error estimate \\x — Xk\\A ^ \ —?== I — XqUa • (19) 

This is the best-known error estimate, although it doesn't account for any clustering 
of the eigenvalues of A. It involves only the condition number Amax/Amin- Prob- 
lem gives the "optimal" error estimate but it is not so easy to compute. That 

optimal estimate needs all the eigenvalues of A, while (19) uses the extreme eigenval- 
ues Aniax(^) and Xmin{A) — which in practice we can bound above and below. 



Minimum Residual Methods 



When A is not symmetric positive definite, CG is not guaranteed to solve Ax = 
b. We lose control of (f-Ad in computing a. We will follow van der Vorst [ ] in 
briefly describing the minimum norm residual approach, leading to MINRES 
and GMRES. 

These methods choose xj in the Krylov subspace Kj so that — is minimal. 
The first orthonormal vectors qi, . . . ,qj go in the columns of Qj, so QjQj = I. As 
in (16) we set Xj = Qjy, to express the solution as a combination of those g's: using (8) 
is 



Norm of residual 



\b-AQjy\ 



'ju — u^ AxjW = \\b - AQjuW = \\b - Qj+iHj+ijy\\. (20) 

Here I used the first j columns of Arnoldi's formula AQ = QH . Since the jth column 
of H is zero after entry j + 1, we only need j ' + 1 columns of Q on the right side: 



First i columns oi QH = \^ qi ■ ■ ■ qj+i ] 



hn 
hi2 



h 



(21) 



The norm in (20) is not changed when we multiply by Qj+i- Our problem becomes: 



Choose y to minimize 



H. 



j+i,jy\ 



(22) 



This is an ordinary least squares problem with only j + 1 equations and j unknowns. 
The right side is (||ro||,0, . . . , 0) as in (16). The rectangular matrix -ffj+ij is 

Hessenberg in (21). We face a completely typical problem of numerical linear algebra: 
Use zeros in H and Qj^ib to find a fast algorithm that computes y. The two favorite 
algorithms for this least squares problem are closely related: 



MINRES A is symmetric (likely indefinite, or we use CG) and H is tridiagonal. 
GMRES A is not symmetric and the upper triangular part of H can be full. 
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In both cases we want to clear out that nonzero diagonal below the main diagonal of 
H. The natural way to do that, one entry at a time, is by ^^Givens rotations.''^ These 
plane rotations are so useful and simple (the essential part is only 2 by 2) that we 
complete this section by explaining them. 



Givens Rotations 

The direct approach to the least squares solution of Hy = f constructs the normal 
equations H^Hy = H^f. That was the central idea in Chapter 1, but you see what 
we lose. If H is Hessenberg, with many good zeros, H^H is full. Those zeros in H 
should simplify and shorten the computations, so we don't want the normal equations. 

The other approach to least squares is by Gram-Schmidt. We factor H into 
orthogonal times upper triangular. Since the letter Q is already used, the or- 
thogonal matrix will be called G (after Givens). The upper triangular matrix is 
G^^H. The 3 by 2 case shows how a rotation in the 1-2 plane can clear out h2i'- 



(23) 



That bold zero entry requires hu sin 6 = h2i cos 6, which determines the rotation 
angle 9. A second rotation Gj^^, in the 2-3 plane, will zero out the 3,2 entry. Then 
^32^21^ is a square upper triangular matrix U above a row of zeros! 

The Givens orthogonal matrix is 6*21^32 but there is no reason to do this multi- 
plication. We use each Gij as it is constructed, to simplify the least squares problem. 
Rotations (and all orthogonal matrices) leave the lengths of vectors unchanged: 
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h32 
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\\Hy-f\\ = \\G,2'G2!Hy - G,,'G,lf\\ 



'u' 




'F' 





y- 


e 



(24) 



This length is what MINRES and GMRES minimize. The row of zeros below U 
means that the last entry e is the error — we can't reduce it. But we get all the other 
entries exactly right by solving the j by j system Uy = F (here j = 2). This gives 
the best least squares solution y. Going back to the original problem of minimizing 
the residual ||r|| = \\b — Axj\\, the best xj in the jth Krylov space is Qjy. 

For non-symmetric A (GMRES rather than MINRES) the recurrence is not short. 
The upper triangle in H can be full, and step j becomes expensive. Possibly it is 
inaccurate as j increases. So we may change "full GMRES" to GMRES (m), which 
restarts the algorithm every m steps. It is not so easy to choose a good m. But 
GMRES is an important algorithm for unsymmetric A. 

Problem Set 6.4 



When the tridiagonal K is preconditioned by P = T (second difference matrix 
with Til = 1) show that T'^K = I + le^ with ej = [1 ... 0]. Start from 
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K = T + eiej. Then T~^K = 1+ {T-^ei)eJ. Verify that T'^ei 



from: 



2 
3 



5 
6 





"1-1 




N 




" 1 " 


T£ = ei = 


-1 2 -1 




N-1 



















-1 2 




1 








Second differences of this hnear vector i are zero. Muhiply / + 
/ — {ieJ)/{N + 1) to estabhsh that this is the inverse matrix K~^T. 



times 



For the model of a square grid with separator down the middle, create the 
reordered matrix K in equation (4). Use spy(i^') to print its pattern of nonzeros. 

Arnoldi expresses each Aqj as hj+ijqj+i + hj^qj + ■ ■ ■ + hijqi. Multiply by qj to 
find hij = qjAqj. If A is symmetric this is {Aqi)^qj. Explain why {Aqi)^qj = 



for 2 < j — 1 by expanding Aqi into + ■ 

recurrence if A = A^ (only hj 



+ hi iqi. We have a short 



j+ij and hjj and hj_ij are nonzero) 



(This is Problem 3 at the matrix level) The Arnoldi equation AQ = QH gives 
H = Q'^AQ = Q^AQ. Therefore the entries of H are hij = qJ Aqj. 

(a) Which Krylov space contains Aqj ? What orthogonality gives hij = when 
i > j + 17 Then H is upper Hessenberg. 

(b) If A^^ = A then hij = {Aqi)'^qj. Which Krylov space contains Aqi ? What 
orthogonality gives hij = when j > i + 17 Now H is tridiagonal. 

Test the pcg(y4, ) MATLAB command on the —1,2,-1 second difference 
matrix A = K. As preconditioner use P = T, when Tn = 1. 

If = [6 Ab ... y4"~^&] is a Krylov matrix with A = A"^, why is the inner 
product matrix K^K a Hankel matrix? This means constant entries down 
each antidiagonal (the opposite of Toeplitz). Show that {K^K)ij depends on 
i + j. 

These are famous names associated with linear algebra (and a lot of other 
mathematics too). All dead. Write one sentence on what they are known for. 



Arnoldi 

Cholesky 

Fourier 

Frobenius 

Gauss 

Gershgorin 

Givens 



Gram 

Hadamard 

Hankel 

Hessenberg 

Hestenes-Stiefel 

Hilbert 

Householder 



Jacobi 

Jordan 

Kronecker 

Krylov 

Lanczos 

Markov 

Schmidt 



Schur 

Schwartz 

Seidel 

Toeplitz 

Vandermonde 

Wilkinson 

Woodbury 



solution: Jacobi (matrices), Gauss (elimination, numerical integration), 
Lanczos [-1,2,-1 with q_l=(l,0,-)] , zeros of cosines/convergence rate?. 
Another q_l? Berresford. 



